Knowledge-Based Visual Question Answering Using Multi-Modal Semantic Graph

Authors

Abstract

The field of visual question answering (VQA) has seen a growing trend of integrating external knowledge sources to improve performance. However, owing to the potential incompleteness of external knowledge sources and the inherent mismatch between different forms of data, current knowledge-based visual question answering (KBVQA) techniques are still confronted with the challenge of effectively utilizing multiple heterogeneous data. To address this issue, a novel approach centered on a multi-modal semantic graph (MSG) is proposed. The MSG serves as a mechanism for unifying the representation of heterogeneous data and diverse types of knowledge. Additionally, a multi-modal semantic graph knowledge reasoning model (MSG-KRM) is introduced to perform deep fusion of image–text information and external knowledge sources. The development of the semantic graph involves extracting keywords from the image object detection information, the question text, and external knowledge texts, which are then represented as symbol nodes. Three semantic graphs are constructed based on the knowledge graph, including vision, question, and external knowledge text graphs, and non-symbol nodes are added to connect these three independent graphs, which are marked with their respective node and edge types. During the inference stage, the multi-modal semantic graph and image–text information are embedded into the feature semantic graph through embedding methods, and a type-aware graph attention module is employed for deep reasoning. The final answer prediction is a blend of the output of the pre-trained model, graph pooling results, and the characteristics of the non-symbolic nodes. The experimental results on the OK-VQA dataset show that MSG-KRM is superior to existing methods in terms of overall accuracy score, achieving a score of 43.58, with improved accuracy on most subclass questions, proving the effectiveness of the proposed method.
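The graph construction described in the abstract can be pictured with a minimal Python sketch. This is an illustration only, not the authors' implementation: the `TypedGraph` class, the `build_msg` function, the one-hub-per-sub-graph layout, and every node and edge type name below are hypothetical choices made for this example. It shows keywords from three sources becoming typed symbol nodes, with non-symbol nodes linking the three otherwise independent sub-graphs.

```python
from dataclasses import dataclass, field


@dataclass
class TypedGraph:
    """A tiny typed graph: every node and every edge carries a type label."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, target id, edge type)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, src, dst, edge_type):
        self.edges.append((src, dst, edge_type))


def build_msg(vision_keywords, question_keywords, knowledge_keywords):
    """Build a toy multi-modal semantic graph from three keyword lists.

    Assumption for illustration: each sub-graph gets one non-symbol hub node,
    and the hubs are linked pairwise to connect the three sub-graphs.
    """
    graph = TypedGraph()
    sources = {
        "vision": vision_keywords,
        "question": question_keywords,
        "knowledge": knowledge_keywords,
    }
    for source, keywords in sources.items():
        hub = f"<{source}>"
        graph.add_node(hub, "non_symbol")
        for keyword in keywords:
            node_id = f"{source}:{keyword}"
            # Symbol nodes are typed by the sub-graph they come from.
            graph.add_node(node_id, f"{source}_symbol")
            graph.add_edge(hub, node_id, f"{source}_contains")
    hubs = [f"<{source}>" for source in sources]
    for i, first in enumerate(hubs):
        for second in hubs[i + 1:]:
            graph.add_edge(first, second, "cross_graph")
    return graph


msg = build_msg(["dog", "frisbee"], ["sport"], ["toy"])
print(len(msg.nodes), "nodes,", len(msg.edges), "edges")
```

A type-aware attention module of the kind the abstract mentions would then consume the node and edge type labels alongside the embeddings; that step is outside the scope of this sketch.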

Similar resources

Constraint-Based Question Answering with Knowledge Graph

WebQuestions and SimpleQuestions are two benchmark data-sets commonly used in recent knowledge-based question answering (KBQA) work. Most questions in them are ‘simple’ questions which can be answered based on a single relation in the knowledge base. Such data-sets lack the capability of evaluating KBQA systems on complicated questions. Motivated by this issue, we release a new data-set, namely...

Multi-Modal Question-Answering: Questions without Keyboards

This paper describes our work to allow players in a virtual world to pose questions without relying on textual input. Our approach is to create enhanced virtual photographs by annotating them with semantic information from the 3D environment’s scene graph. The player can then use these annotated photos to interact with inhabitants of the world through automatically generated queries that are gu...

Knowledge-Based Question Answering

This paper describes the Webclopedia Question Answering system, in which methods to automatically learn patterns and parameterizations are combined with hand-crafted rules and concept ontologies. The source for answers is a collection of 1 million newspaper texts, distributed by NIST. In general, two kinds of knowledge are used by Webclopedia to answer questions: knowledge about language and kn...

Question Answering System Using Semantic Dependency Tree and State Graph

The basic architecture of a Question Answering System (QAs), based on Natural Language Processing, subsumes question analysis and answer extraction. The paper presents a system which is based on semantic analysis, relates the words logically and provides an admissible answer to the user query. Instead of using template based query, it accepts questions phrased in various forms. The question is ...

Knowledge Based Question Answering

The natural language database query system incorporated in the KNOBS interactive planning system comprises a dictionary driven parser, APE-II, and script interpreter which yield a conceptual dependency conceptualization as a representation of the meaning of user input ...

Journal

Journal title: Electronics

Year: 2023

ISSN: 2079-9292

DOI: https://doi.org/10.3390/electronics12061390